
Metabarcoding and Metagenomics

Pensoft Publishers

Preprints posted in the last 90 days, ranked by how well they match Metabarcoding and Metagenomics's content profile, based on 12 papers previously published here. The average preprint has a 0.00% match score for this journal, so anything above that is already an above-average fit.

1
Is DNA metabarcoding an option for formaldehyde-preserved zooplankton time series?

Albaina, A.; Lanzen, A.; Miguel, I.; Rendo, F.; Santos, M.

2026-02-09 zoology 10.64898/2026.02.06.704415 medRxiv
Top 0.1%
4.4%

The recovery of amplifiable DNA from formaldehyde-fixed (FF) zooplankton samples has long been considered unfeasible. Nevertheless, advances in DNA sequencing and in methods for retrieving highly degraded genetic material have demonstrated that even million-year-old samples and FF museum specimens can yield usable DNA. To access the biological information preserved in long-term zooplankton time series, we assessed methodologies for extracting amplifiable DNA from community samples stored for up to 28 years in formaldehyde at room temperature. On the one hand, we report the failure of a method previously described as successful for FF zooplankton samples, likely due to the cold-storage conditions (4 °C) used in the original study. On the other hand, by adapting two extraction protocols designed for FF museum specimens, representing harsher and softer alternatives (HHA and HPC, respectively), we successfully amplified and sequenced a subset of FF zooplankton samples. As expected, DNA integrity and sample pH were inversely related to preservation time, and only short DNA fragments were recovered, ruling out the use of commonly employed metabarcoding markers of ≥300 bp. While DNA integrity appeared to be a better predictor of amplification success than DNA yield, the presence of a gel band of the expected size did not always guarantee congruence with microscopy-based assessments. Although amplifiable DNA was recovered from most samples, including some of the oldest, community compositions concordant with microscopy were consistently recovered only from samples preserved for up to two years. Beyond this point, the HHA and HPC methods produced divergent results, reflecting a trade-off between the removal of formaldehyde-induced cross-linkages and the avoidance of additional DNA damage.
Among the small universal markers tested (~120-170 bp), including one nuclear rRNA marker and two mitochondrial markers, only the 18S rRNA V9 region consistently amplified. We conclude by providing a set of recommendations aimed at improving the methods presented here.

2
Unveiling protist composition and diversity patterns with eDNA metabarcoding: comparing short- and long-read approaches

SKOUROLIAKOU, D. I.; Dupont Valcy, D. W. E.; Yelle, V.; D'hont, S.; Sabbe, K.; Schon, I.

2026-02-09 ecology 10.64898/2026.02.07.704525 medRxiv
Top 0.1%
2.8%

Environmental DNA (eDNA) metabarcoding is a key tool in biodiversity monitoring due to its high-throughput, non-destructive nature. While short-read (SR) sequencing platforms such as Illumina MiSeq have been routinely used in environmental monitoring, their limited read lengths (less than 600 bp) constrain the depth of taxonomic assignment, particularly for complex microbial eukaryotes such as protists. Conversely, long-read (LR) sequencing technologies such as Oxford Nanopore Technologies (ONT) offer promising alternatives but remain underused for studying protist communities. We conducted a comparative study of SR versus LR metabarcoding of protist communities along a coastal-offshore gradient in the Belgian part of the North Sea. Using amplicons targeting the V4 region (SR; 577 bp) and the V4-V5 region (LR; 745 bp) of the 18S rRNA gene, we compared diversity patterns, taxonomic assignment, and community composition between approaches. Community composition was generally congruent at higher taxonomic levels, but under the applied workflows, LR metabarcoding yielded a greater depth of taxonomic annotation at lower taxonomic ranks, even though both methods targeted only partial regions of the 18S rRNA gene. Notably, dinoflagellates were less overrepresented in LR data, and only LR metabarcoding detected potential nuisance taxa (e.g., Bellerochea) and ecologically important haptophytes (e.g., Gephyrocapsa). These results highlight the potential of LR metabarcoding to complement SR approaches by providing increased taxonomic annotation depth and ecological insights. As next-generation sequencing technologies continue to evolve, our research provides valuable insights for selecting optimal strategies in routine plankton monitoring and biodiversity assessment programs.

3
State of mangrove biodiversity assessment in Kenya and the prospect of environmental DNA in strengthening surveys

Fredrick Onyango, O.; Okello, J. A.; Muchiri, Z.; Mwamburi, S. M.; Labatt, C.; Owiro, E. O.; Cherono, S.

2026-03-17 molecular biology 10.64898/2026.03.14.711771 medRxiv
Top 0.1%
2.5%

Assessing and monitoring biodiversity in mangrove ecosystems remains challenging, with most studies relying on proxy indicators to infer biodiversity status. This limits understanding of biodiversity dynamics and constrains evidence-based mangrove management. In the Western Indian Ocean region, biodiversity assessments in mangrove forests remain scarce, with no clear information on their spatiotemporal and taxonomic coverage. Addressing these gaps requires examining existing biodiversity records and exploring complementary approaches that can broaden the scope and efficiency of biodiversity monitoring. This study assessed the current state of biodiversity assessments in mangrove forests in Kenya and evaluated the feasibility of environmental DNA (eDNA) as a complementary biodiversity monitoring tool. A systematic literature review was conducted by retrieving published sources from major academic databases with defined search terms, and taxonomic information was extracted and compiled from them. In addition, a snapshot eDNA survey was carried out in selected mangrove forests, where sediment and water samples were collected, processed, and analyzed using established molecular and bioinformatics pipelines. The literature review identified 26 sources documenting biodiversity across 15 mangrove forest areas, with 68% of the studies concentrated in four sites representing about 6% of mangrove cover in Kenya. A total of 1,044 unique taxa belonging to 255 families were identified, with the classes Teleostei, Aves, Chromadorea, and Malacostraca accounting for 84.5% of documented taxa. The eDNA survey detected heterogeneous taxa from multiple ecosystems, including 502 taxa belonging to 305 families. Only 67 families were common to both datasets, highlighting the complementarity of literature-based inventories and eDNA detection. While eDNA showed considerable potential to expand biodiversity detection, its application is constrained by several factors.
Integrating eDNA as a core biodiversity monitoring tool in mangroves will require combining conventional surveys with molecular tools, developing curated regional DNA reference databases, and adopting standardized analytical frameworks.

4
On-site metabarcoding analysis of environmental DNA samples

Mauvisseau, Q.; Ewer, I.; Blumeris, I.; Iren Bongo, S.; Filipe Brito de Oliveira, L.; Gouvea, B.; Carolina Cei, A.; Ferreira Rodrigues, K.; de Arruda Francisco, J.; Sletteng Garvang, E.; Marena do Rego Henriques, V.; Hurtado Solano, S.; Kvalheim, L.; Kaylynne Lawrence, S.; Ramalho Maciel, B.; Isanda Masaki, H.; Fortunate Mashaphu, M.; Masimula, L.; Prudent Mokgokong, S.; Katrin Onshuus, E.; Lima Paiva, B.; Parker-Allie, F.; Du Plessis, M.; Puzicha, M.; Gabriel Da Silva Solano Reis, O.; Speelman, G.; Moritz Splitthof, W.; Stocco de Lima, A. C.; Strindberg, H.; Smoge Saevik, O.; Tafjord, N. J. D

2026-03-30 ecology 10.64898/2026.03.27.714757 medRxiv
Top 0.1%
1.9%

Environmental DNA metabarcoding is a powerful monitoring tool for assessing aquatic biodiversity, as well as the sustainability and impacts of fisheries and aquaculture. However, conventional laboratory workflows remain time-consuming and dependent on dedicated infrastructures. Here, we present a field trial of a fully portable, off-grid eDNA metabarcoding pipeline that enables end-to-end analysis within a few days using compact equipment, including a BentoLab workstation and an Oxford Nanopore Technologies (ONT) MinION sequencer. The workflow was implemented during two international training courses in Norway and Brazil, where students and early career researchers collected environmental samples, extracted and amplified DNA, prepared DNA libraries, and sequenced on-site before performing bioinformatics and statistical analyses. In the case study detailed here, seven eDNA samples collected and analysed on-site in the Oslofjord allowed detection of 16 fish and elasmobranch species. Although overall diversity was lower than in earlier studies using Illumina-based sequencing, our protocol reliably detected key species and demonstrates that portable eDNA metabarcoding is feasible for rapid ecological assessment, surveillance of high-risk regions and/or deployment in remote or resource-limited settings.

5
Variable performances of commercial eDNA inventories challenge their use for surveying stream fish communities

Roussel, J.-M.; Quemere, E.; Bonnet, B.; Covain, R.; Dezerald, O.; Lassalle, G.; Le Bail, P.-Y.; Petit, E. J.; Pottier, G.; Quartarollo, G.; Vigouroux, R.; Lalague, H.

2026-03-17 ecology 10.64898/2026.03.15.711554 medRxiv
Top 0.1%
1.8%

Environmental DNA (eDNA) metabarcoding of water samples is increasingly used to detect fish species in streams. Several studies have concluded that it can outperform traditional inventory methods and recommend using it at large scales for fish-based ecological assessments. However, no standard protocol can guarantee sufficient detection rates and repeatability, despite companies offering an extensive range of analyses.

We compared eDNA metabarcoding performed by four companies. Following their guidelines, samples were collected in a small tropical stream in the Maroni River (French Guiana) that hosts a species-rich fish community. We compared their inventories with each other, with a list of species captured during an extensive fish inventory performed immediately after eDNA sampling, and with current data on species distributions.

The number of species detected by eDNA metabarcoding ranged from 5 to 48 among the companies, but these inventories contained many inaccuracies. All companies combined, 63 species were detected, of which 10 (16%) had never been reported in the Maroni River. The extensive inventory identified 50 species in the local fish community, of which 16-46 were not detected by eDNA metabarcoding (i.e., a false-negative rate of 32%-92% among the companies).

Reanalysis of the raw sequencing data greatly reduced the differences among companies, highlighting the importance of using a comprehensive and accurate DNA barcode database for species assignment. Dissimilarity indices comparing the local fish community (based on presence/absence or fish catches) with eDNA detections revealed large differences regardless of the company.

Summary and applications. The large percentage of species not detected by eDNA metabarcoding of water samples could strongly bias fish-diversity inventories in streams that host species-rich communities. This issue is not well documented in the literature, and we recommend that future studies examine other stream contexts. The large differences between commercial eDNA inventories and the local fish community challenge the use of eDNA metabarcoding for fish-based ecological assessments of streams. The variable performance of eDNA companies indicates the need for a standard protocol and access to a comprehensive DNA database before large-scale eDNA programmes begin.

Highlights
- eDNA metabarcoding of water samples is widely used to detect species in streams
- Detection performances of 4 private companies were compared to an exhaustive fish inventory
- The number of undetected species varies from 32 to 92% depending on the company
- Such discrepancies challenge the use of eDNA for fish-based ecological assessments
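The false-negative rates above follow from simple set arithmetic: species captured in the local inventory but missing from a company's eDNA list, divided by the 50 captured species (16/50 = 32% and 46/50 = 92%). The Python sketch below makes this explicit and adds a Jaccard dissimilarity on presence/absence data. The abstract does not say which dissimilarity indices the authors used, so Jaccard is an assumption here, and the species codes and lists are hypothetical.

```python
def false_negative_rate(reference: set[str], detected: set[str]) -> float:
    """Share of species in the reference (captured) inventory missed by eDNA."""
    if not reference:
        raise ValueError("reference inventory is empty")
    return len(reference - detected) / len(reference)


def jaccard_dissimilarity(a: set[str], b: set[str]) -> float:
    """1 - |A & B| / |A | B| on presence/absence data (0 = identical lists)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical data: 50 species captured in the field inventory, and one
# company's eDNA list that recovers 34 of them plus 3 species never
# recorded in the basin (candidate false positives).
captured = {f"sp{i:02d}" for i in range(50)}
edna = {f"sp{i:02d}" for i in range(34)} | {"extra1", "extra2", "extra3"}

print(f"False-negative rate: {false_negative_rate(captured, edna):.0%}")      # 32%
print(f"Jaccard dissimilarity: {jaccard_dissimilarity(captured, edna):.3f}")  # 0.358
```

Note that the three extra species raise the dissimilarity but not the false-negative rate, which is why both kinds of error are worth reporting separately.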

6
Environmental DNA/RNA metabarcoding in estuaries of Sao Paulo, Brazil, reveals fish diversity and the presence of invasive species

Nitzsche, N. M.; Mota, A. P.; Chen, T.; Nogueira, M. G.; Nogueira, E. J.; Sales, N. G.; Hilario, H. O.; Pinhal, D.

2026-02-15 ecology 10.64898/2026.02.13.705801 medRxiv
Top 0.1%
1.6%

Tropical estuaries within the Brazilian Atlantic Forest are biodiversity hotspots facing escalating anthropogenic pressures, yet their ichthyofaunal assemblages remain incompletely documented. We evaluated the combined use of environmental DNA (eDNA) and environmental RNA (eRNA) metabarcoding to characterize fish communities in two estuaries with contrasting levels of urbanization (the Juqueriquere and Escuro rivers) on the northern coast of Sao Paulo, Brazil. Targeting the mitochondrial 12S rRNA (MiFish) fragment, we detected a diverse vertebrate assemblage totaling 93 species. eDNA identified 32 fish species across both systems, while eRNA detected 22 species in the preserved estuary, providing robust signals of metabolically active assemblages. The less impacted estuary exhibited significantly higher diversity indices and a more heterogeneous taxonomic composition. In contrast, the urbanized system displayed clear molecular signatures of anthropogenic influence, including the presence of invasive species (Oreochromis niloticus, O. aureus, and Clarias gariepinus) and domestic animals. This study constitutes the first application of fish eRNA metabarcoding in Brazil and demonstrates that integrating eDNA and eRNA refines ecological interpretation by coupling biodiversity detection with improved inference about contemporary community composition. Our findings highlight the potential of multi-molecule metabarcoding for routine, non-invasive biodiversity assessment in megadiverse and conservation-priority coastal ecosystems.

7
Revisiting the genetics of Lake Constance Coregonids using lake-wide whole genome sequencing

Jacobs, A.; Roch, S.; Roberts, B.; Capstick, M.; Brinker, A.

2026-01-18 ecology 10.64898/2026.01.18.700192 medRxiv
Top 0.1%
1.6%

Anthropogenic pressures can have detrimental impacts on fish populations, and their effective management and conservation require accurate monitoring tools. Yet this is not straightforward for closely related, co-existing species that are difficult to distinguish using simple phenotypic or genetic approaches. Coregonids are of cultural and economic importance across Europe but have faced multiple pressures over the last century, and genomic management tools are lacking. In Lake Constance, a large pre-alpine lake, stocks have collapsed drastically under multiple pressures, leading to a fishery closure. Here, we adopt a cost-effective whole genome sequencing approach for lake-wide assessment of the stock composition, spatial distribution and genetic diversity of highly admixed Lake Constance whitefish (Coregonus spp.). By sequencing 983 adult and larval genomes, we show that nearly 90% of the stock is made up of one of three species, the Gangfisch (C. macrophthalmus), and define the genetic relationship between Upper and Lower Lake Constance whitefish stocks. We also identified strong mixing between Gangfisch and Blaufelchen (C. wartmanni) on traditionally species-specific spawning grounds, and detected strong admixture in larvae, with potentially drastic impacts on the effectiveness of hatchery supplementation and stocking. Despite the collapse and admixture, the species still exhibit low to moderate levels of genetic diversity, maintain ecologically relevant genetic differences, and seem to differ in habitat use. Overall, we present a cost-effective, transferable tool for stock-wide sequencing and genetically informed fisheries management, and our results call for a re-evaluation of current management practices to avoid potential genetic mixing between species.

8
Cryptic diversity in Astyanax (Characiformes: Acestrorhamphidae) from the Magdalena basin, Colombia: Insights from molecular and morphometric evidence

Marquez, E. J.; Garcia-Castro, K. L.; Alvarez, D. R.; DoNascimiento, C.

2026-03-31 genetics 10.64898/2026.03.28.714954 medRxiv
Top 0.1%
1.5%

Astyanax Baird & Girard, 1854 is a widely distributed and species-rich genus of Acestrorhamphidae, whose abundant populations play a crucial trophic role in Neotropical basins. Taxonomic uncertainties persist within the genus, as seen in Astyanax sp. (formerly designated as A. fasciatus) from the Magdalena basin in Colombia. Concerns about its genetic status are heightened by the ecological threats posed by hydroelectric dams, ranging from habitat loss to reduced river connectivity. We isolated and characterized 17 microsatellite loci to assess the population genetics of this species in a broad sample from the middle and lower sections of the Cauca River, now interrupted by the Ituango dam. Furthermore, a multidisciplinary approach integrating phylogenetic analyses of mitochondrial (COI) and nuclear (rag2) markers with geometric morphometric analyses was used to evaluate potential cryptic diversity within Astyanax sp. Microsatellites revealed two genetic groups in the study area, strongly supported as distinct lineages by phylogenetic analyses. Unexpectedly, one of these lineages of Astyanax sp. was recovered in an unresolved clade with samples of A. microlepis and allopatric samples of A. viejita from the Maracaibo Lake basin. Each genetic group showed high genetic diversity, but also evidence of recent bottleneck events and significantly high inbreeding values. Morphometric analyses provided evidence of significant phenotypic differentiation among A. microlepis, Astyanax sp. 1 (Asp1), and Astyanax sp. 2 (Asp2). Morphological patterns ranged from the robust profile of A. microlepis to the streamlined shape of Astyanax sp. 2 (Asp2), with Astyanax sp. 1 (Asp1) displaying intermediate traits and localized differences in head length and fin placement.
Statistical support from permutation tests and a high overall classification accuracy (95.65%) underscore the existence of distinct morphospecies, suggesting that phenotypic differentiation is well-established, despite the complex evolutionary history of the group. This study suggests the presence of cryptic diversity within Astyanax sp. and provides valuable genetic information for the conservation and management of their populations in the Magdalena basin.

9
Evaluation of Protein Reference Database Reduction and Its Impact on Peptide-Centric Metaproteomics

Vande Moortele, T.; Van de Vyver, S.; Binke, B.-B.; Van Den Bossche, T.; Dawyndt, P.; Martens, L.; Mesuere, B.; Verschaffelt, P.

2026-02-25 bioinformatics 10.64898/2026.02.24.707692 medRxiv
Top 0.1%
1.0%

Introduction/Background: Recent large-scale restructurings of UniProtKB included the removal of redundant entries, the exclusion of taxonomically unclassified organisms, and a shift toward a more reference-proteome-centered approach. This raised concerns about the stability of peptide-centric metaproteomics workflows. In parallel, metagenomics-assisted "targeted" database restriction is often proposed to reduce ambiguity, but its net impact on peptide-centric interpretation remains unclear. Methods: We assessed the impact of three complementary factors on the taxonomic profiling of metaproteomics analyses: (i) successive global UniProtKB reductions, (ii) metagenomics-derived targeted database restriction, and (iii) Unipept's internal taxon validation filter. Peptide lists from two public metaproteomics datasets (human gut and marine hatchery) were analysed with Unipept and compared across sequential UniProtKB configurations and custom SSU/LSU-derived filtered databases. Results: Across both environments, progressive UniProtKB downsizing reduced peptide coverage, did not fundamentally alter the most abundant taxa, and substantially lowered ambiguous root-level assignments. This suggests that the reduction in ambiguity stemmed from decreased redundancy rather than a loss of meaningful biological information. Metagenomics-assisted targeted filtering introduced a clear trade-off: it markedly reduced peptide matches but changed resolution at lower taxonomic ranks only modestly, while consistently reducing non-specific root-level assignments. The effects on taxon discoverability and relative abundances were heavily dependent on the environment, with stronger shifts observed in the less well-represented marine dataset. Finally, the added benefit of Unipept's internal taxon validation filter decreased across newer, more curated database configurations: it had the largest impact on older, more inclusive releases and became minimal under the reference-proteome-focused setup.
Discussion/Conclusion: Overall, UniProtKB restructuring does not destabilize peptide-centric metaproteomic analyses. Instead, it tends to reduce ambiguity while preserving high-level community structure. Targeted database restriction offers a trade-off between sensitivity and reduced ambiguity that is strongly context-dependent. As UniProtKB becomes increasingly curated and reference-proteome-centered, the need for additional internal taxonomic filtering in Unipept appears to diminish.
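Root-level assignments arise because Unipept assigns each peptide to the lowest common ancestor (LCA) of all taxa whose proteins contain it, so a single match in an unrelated lineage pushes the assignment up to the root. The Python sketch below uses a plain LCA over a hypothetical toy taxonomy to show why removing one cross-lineage database entry can sharpen an assignment. Unipept's own LCA variant and its taxon validation filter may behave differently from this simplified version.

```python
# Hypothetical toy taxonomy: child -> parent ("root" has no parent).
PARENT = {
    "Bacteria": "root",
    "Eukaryota": "root",
    "Bacteroidota": "Bacteria",
    "Bacteroides": "Bacteroidota",
    "B. fragilis": "Bacteroides",
    "B. thetaiotaomicron": "Bacteroides",
    "Chordata": "Eukaryota",
    "Homo sapiens": "Chordata",
}


def lineage(taxon: str) -> list[str]:
    """Path from the root down to `taxon`."""
    path = [taxon]
    while path[-1] != "root":
        path.append(PARENT[path[-1]])
    return path[::-1]


def lowest_common_ancestor(taxa: list[str]) -> str:
    """Deepest taxon shared by the lineages of all matched taxa."""
    lca = "root"
    # zip stops at the shortest lineage; walk down while all lineages agree.
    for ranks in zip(*(lineage(t) for t in taxa)):
        if len(set(ranks)) != 1:
            break
        lca = ranks[0]
    return lca


# A peptide found in proteins of two Bacteroides species and one human entry.
matches = ["B. fragilis", "B. thetaiotaomicron", "Homo sapiens"]
print(lowest_common_ancestor(matches))      # root
# Without the cross-domain entry, the same peptide resolves to genus level.
print(lowest_common_ancestor(matches[:2]))  # Bacteroides
```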

10
Barcode Crosstalk in ONT Multiplex Sequencing: Quantification and Mitigation Strategies

Scharf, S. A.; Spohr, P.; Ried, M. J.; Haas, R.; Klau, G. W.; Henrich, B.; Pfeffer, K.

2026-03-28 molecular biology 10.64898/2026.03.27.714689 medRxiv
Top 0.1%
0.9%

Multiplexing samples in long-read sequencing with Oxford Nanopore Technologies (ONT), by ligating specific native barcodes to individual DNA samples, substantially increases sequencing throughput while significantly reducing sequencing costs. However, this advantage carries the risk of barcode misassignment, or crosstalk. When multiplexing samples with ONT, we observed misassigned barcodes (so-called barcode crosstalk) after library preparation according to the standard protocol, particularly in samples with low input DNA concentrations. We assumed that these misassignments are largely due to misligation of remaining native barcodes during the subsequent sequencing adapter ligation. To systematically investigate and quantify barcode crosstalk, genomic DNA (gDNA) from four bacterial type strains at different DNA input concentrations was prepared using three library preparation protocols: the Nanopore standard protocol (protocol A: version valid until July 2, 2025), the new Nanopore protocol (protocol B: version from July 2, 2025), and an in-house protocol in which the barcoded samples are pooled only after the sequencing adapter ligation step (protocol C: in-house). All samples were sequenced on a Nanopore PromethION device. The results clearly showed that protocol A resulted in pronounced barcode crosstalk, especially detectable in samples with low DNA input concentrations (up to 2.4% misassigned reads). The ONT adjustment in protocol B (an altered washing buffer compared with protocol A) significantly reduced barcode crosstalk to below 0.01%, whereas protocol C eliminated it virtually completely. These observations emphasize that sequencing results obtained with older variants of the ONT native barcoding protocol should be critically reviewed.
The newer ONT barcoding protocol is preferable for sequencing, but it does not completely eliminate the barcode crosstalk effect. In conclusion, protocol C is recommended for low DNA input and high-accuracy sequencing.
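Crosstalk can be quantified when each barcode carries a known strain: every read that demultiplexes to a barcode but originates from a different strain counts as misassigned. The Python sketch below assumes reads have already been attributed to a source strain (for example, by mapping to the reference genomes; the abstract does not describe how the authors did this), and the barcode names and read counts are hypothetical. It also illustrates why low-input samples are more affected: the same number of spill-over reads forms a much larger fraction of a barcode with few reads of its own.

```python
from collections import Counter
from typing import Iterable, Mapping


def crosstalk_rates(
    reads: Iterable[tuple[str, str]], expected: Mapping[str, str]
) -> dict[str, float]:
    """Per-barcode fraction of reads whose source strain is not the strain
    loaded on that barcode.

    reads: (barcode, inferred_source_strain) pairs, one per read.
    expected: barcode -> strain actually ligated to that barcode.
    """
    totals = Counter()
    misassigned = Counter()
    for barcode, strain in reads:
        totals[barcode] += 1
        if strain != expected[barcode]:
            misassigned[barcode] += 1
    return {bc: misassigned[bc] / totals[bc] for bc in totals}


# Hypothetical run: a high-input sample on barcode01 and a low-input sample
# on barcode02, each receiving the same 24 reads from the other sample.
expected = {"barcode01": "strain_A", "barcode02": "strain_B"}
reads = (
    [("barcode01", "strain_A")] * 9_976 + [("barcode01", "strain_B")] * 24
    + [("barcode02", "strain_B")] * 976 + [("barcode02", "strain_A")] * 24
)
rates = crosstalk_rates(reads, expected)
for barcode, rate in sorted(rates.items()):
    print(f"{barcode}: {rate:.2%} misassigned")
# barcode01: 0.24% misassigned
# barcode02: 2.40% misassigned
```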

11
Environmental DNA as an Indicator of Seasonal Reproductive Phenology in Freshwater Mussels

Marshall, N.; Dean, C.; Sierra, M.; Fleece, W. C.

2026-02-20 ecology 10.64898/2026.02.19.706874 medRxiv
Top 0.1%
0.9%

Unionid freshwater mussels exhibit a unique form of mitochondrial inheritance, termed doubly uniparental inheritance, in which a maternal and a paternal mitotype are each transmitted uniparentally. The exclusive presence of the male mitotype in gonadal tissue and sperm cells suggests that environmental DNA (eDNA) could serve as a non-invasive method for monitoring freshwater mussel reproduction. Yet the dynamics of male mitotype detection in the environment remain poorly understood. This study analyzed seasonal eDNA samples from two diverse mussel beds, detecting 24 mitochondrial operational taxonomic units (MOTUs) associated with the male mitotype. For mussels identifiable to species level, peaks in male mitotype signal generally aligned with expected spawning periods based on female gravidity records (e.g., Pyganodon grandis, Lasmigona costata, Ortmanniana ligamentina). Additionally, male mitotype detection was often sporadic compared with the consistently detected female mitotype, indicating that male signals may be tied to behavioral or reproductive events rather than continuous shedding. While elevated male signals may reflect spawning, alternative sources such as tissue decay, mitotype leakage, glochidia release, or post-spawning gamete clearance complicate interpretation. A male-to-female mitotype ratio is proposed as a more reliable proxy for identifying sperm release events, given the high concentration of male mitotypes within spermatozeugmata. Limitations in male mitotype reference databases hindered species-level resolution for many MOTUs, underscoring the need for expanded genomic resources. Overall, this work suggests that male mitotype eDNA can provide valuable insights into mussel reproductive ecology, while emphasizing the importance of long-term monitoring and integrated gametogenesis studies to refine its application in conservation.

12
Species-specific versus community-wide assays in eDNA monitoring of European eel Anguilla anguilla: Trade-offs between detection sensitivity and the value of additional community data

Monaghan, A. I. T.; Sellers, G. S.; Griffiths, N. P.; Lawson Handley, L.; Hänfling, B.; Macarthur, J. A.; Wright, R. M.; Bolland, J. D.

2026-03-20 ecology 10.64898/2026.03.19.712641 medRxiv
Top 0.1%
0.9%

Effective monitoring of the critically endangered European eel (Anguilla anguilla) is essential for conservation planning and regulatory decision-making, particularly in heavily fragmented rivers. Environmental DNA (eDNA) methods offer sensitive alternatives to traditional surveys, but there is uncertainty around whether targeted assays or community-wide approaches are better suited to achieve monitoring objectives. We compared eDNA metabarcoding and species-specific quantitative PCR (qPCR) for detecting A. anguilla across 145 pumped catchments in the Fens, East Anglia, England. All sites were sampled once initially, and sites negative for A. anguilla were re-sampled based on metabarcoding results. This allowed comparison of detection rates from a single water sample and site-level retrospective identification of sites where qPCR could have identified A. anguilla in earlier samples. The findings were also set in the context of the wider biodiversity information generated by metabarcoding. From the initial (single) water sample, qPCR detected A. anguilla at seven more sites than metabarcoding (17 versus 10). With repeated sampling, metabarcoding detected A. anguilla at 43 sites, including all but one of the sites where qPCR detected A. anguilla, and ten sites where qPCR did not detect A. anguilla within the same number of samples. Indeed, the additional sampling effort required to detect A. anguilla with metabarcoding at sites also positive with qPCR was small relative to the overall sampling effort. Furthermore, metabarcoding additionally detected 28 non-target fish species alongside fish, amphibian and mammal species of conservation concern. Our results highlight trade-offs between target-species sensitivity and the broader ecological information provided by each method, and support metabarcoding as an effective tool for a holistic conservation approach, with the additional community data outweighing the marginally increased sensitivity of qPCR.
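The benefit of repeated sampling can be reasoned about with the standard cumulative detection probability, 1 - (1 - p)^n, where p is the per-sample detection probability at an occupied site and n the number of independent samples. The Python sketch below assumes independent samples and a constant p; the p values for qPCR and metabarcoding are hypothetical and are not estimates from this study.

```python
def cumulative_detection(p: float, n: int) -> float:
    """Probability of at least one positive among n independent samples."""
    return 1.0 - (1.0 - p) ** n


def samples_needed(p: float, target: float = 0.95) -> int:
    """Smallest n whose cumulative detection probability reaches `target`."""
    if not (0.0 < p <= 1.0 and 0.0 < target < 1.0):
        raise ValueError("need 0 < p <= 1 and 0 < target < 1")
    n = 1
    while cumulative_detection(p, n) < target:
        n += 1
    return n


# Hypothetical per-sample detection probabilities at an occupied site.
for method, p in [("qPCR", 0.6), ("metabarcoding", 0.5)]:
    print(f"{method}: {samples_needed(p)} samples for 95% detection")
# qPCR: 4 samples for 95% detection
# metabarcoding: 5 samples for 95% detection
```

Under these assumptions a modestly lower per-sample sensitivity costs only one extra sample, which is the kind of trade-off weighed against the additional community data from metabarcoding.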

13
Hunting for Helminths: short- and long-read shotgun metagenomics for parasite detection in faecal samples

O'Brien, K.; Elamaran, A.; Dayi, M.; Keeling, G.; Nevin, W. D.; Liu, Y.; Viney, M.; Reynolds, K.; Bishop, C.; Sripa, B.; Woubshete, M.; Sachs Nique, P.; Wright, R.; Younger, J.; Hunt, V. L.

2026-03-10 molecular biology 10.64898/2026.03.09.710549 medRxiv
Top 0.1%
0.8%

Soil-transmitted helminths (STHs) pose significant challenges to public health in endemic areas, necessitating reliable methods for their detection. Shotgun metagenomics enables simultaneous detection of STHs and microbes in a sample without prior knowledge of what is present. However, validation of shotgun metagenomics with known infection intensity or across different sequencing platforms has not been carried out for eukaryote parasites including STHs, and false positives remain a pervasive issue. We validated shotgun metagenomics as a method of STH detection in faecal samples. Using the Strongyloides ratti laboratory model of STH infection, we investigated how analytical methods (nucleotide-nucleotide matching, nucleotide-protein matching, marker gene detection, mitochondrial mapping), infection intensity and sequencing technology (short-read vs. long-read) affect the sensitivity and specificity of detection. S. ratti was accurately detected at a standard laboratory dose, but low-intensity infections were more difficult to detect. Only mitochondrial sequence mapping identified S. ratti with 100% accuracy and no false positives. Overall, short-read outperformed long-read sequencing methods. We applied the same analytical methods to human faecal samples with confirmed infections for at least one of four STHs. Mitochondrial sequence mapping was also the most effective method for detecting STHs in human faecal samples, detecting 100% of Necator americanus and 92% of Ascaris spp. infections, but could not reliably detect STHs where DNA levels are expected to be low or variable. In conclusion, mitochondrial mapping was the most sensitive and specific detection method in both the laboratory system and human faecal samples. Our findings indicate that shotgun metagenomics should be applied cautiously, using validated methods, particularly when infection intensity or DNA levels are expected to be low.
Author Summary: Soil-transmitted helminths (STHs), such as the parasite Strongyloides, are important gastrointestinal parasites of humans and livestock. Accurate methods of detection for diagnostics and monitoring are important for implementing suitable control and treatment strategies. Here we validate a shotgun metagenomics approach, in which all DNA in a sample is sequenced, for detecting STHs in faecal samples using a Strongyloides laboratory model of infection. Strongyloides was reliably detected in faecal samples at higher infection levels, but mitochondrial genome mapping of the sequences was the only analytical method that reliably detected Strongyloides at lower infection levels. These results were mirrored in stool samples from humans infected with STHs, where mitochondrial mapping was also the most reliable method. However, species associated with low levels of parasite material or DNA in the faeces, including Strongyloides stercoralis, were more difficult to detect. We compared two sequencing methods, short-read Illumina and long-read Oxford Nanopore Technologies, and short-read outperformed long-read shotgun metagenomics. Contamination of parasite genome assemblies by bacterial sequences was problematic for analysis and contributed to false positive results. Future work should focus on specifically targeting eukaryote DNA, at either the laboratory or the bioinformatic stage, to further improve STH detection.

14
Deciphering the genetic basis of phytoplankton traits through genome-wide association studies

Maupetit, A.; Segura, V.; Pajot, A.; Nicolau, E.; Bougaran, G.; Lacour, T.; Berard, J. B.; Charrier, A.; Schreiber, N.; Robert, E.; Saint-Jean, B.; Carrier, G.

2026-02-27 genetics 10.64898/2026.02.27.708454 medRxiv
Top 0.1%
0.8%

Recently, an inventory of phytoplankton genes was compiled through expeditions such as TARA Oceans. Approximately 1.5 million genes were identified, of which at least three-quarters have unknown function. Several research programmes are currently sequencing marine biodiversity, resulting in a rapid expansion of genomic databases, and the genomic sequences of these organisms will soon be readily accessible to the scientific community. Although analysing these data is promising, the characterization of genes and genomes is progressing very slowly and remains a major challenge for scientists. The aim of this study was to use GWAS approaches to decipher genomic loci without a priori assumptions. The microalga Tisochrysis lutea was selected as a case study because of its economic importance and the extensive knowledge accumulated about it over the years. Particular attention was paid to pigment and lipid metabolism because of their high commercial value. To implement the GWAS approach, a collection of 100 algal lineages was established from 15 available polyclonal strains. This collection was then phenotyped under two different culture conditions. Of the 31 phenotypic traits investigated, 18 met the requirements for GWAS analysis. In parallel, each algal lineage was genotyped by whole-genome sequencing to inventory all genetic polymorphisms. A mixed model was applied, revealing 13 significant associations between phenotypic traits and alleles. These associations highlight previously unsuspected genomic loci that play a major role in pigment or lipid content. Genes identified at these loci may have a direct or indirect role in these metabolic pathways. Nevertheless, elucidating the molecular mechanisms of the associated genes remains limited without functional approaches.
Despite the complexity of the process, we conclude that the GWAS approach was effective for deciphering phytoplankton genomes, particularly for quantitative traits of interest. Ideally, this approach should be combined with other functional methods to progressively decode marine genomes.

15
A conservation planning assessment of basin wide Unionid mussel assemblages using environmental DNA

Marshall, N. T.; Seymour, M.; Herbert, N.; Dean, C.; Fleece, W. C.

2026-02-16 ecology 10.64898/2026.02.13.705757 medRxiv
Top 0.1%
0.8%

Conservation planning for rare, threatened, and endangered species requires basic information on distribution and abundance. This information is often lacking because traditional survey methods can be time- and labor-intensive, and thus costly. Environmental DNA (eDNA) metabarcoding offers a promising approach for monitoring freshwater mussel assemblages, a taxonomic group that is both highly imperiled and difficult to survey using traditional methods. We evaluated the performance of eDNA metabarcoding across 30 km of Fish Creek in Ohio and Indiana, U.S., and compared the results to visual surveys conducted at the same sites. eDNA detected 25 mussel species, including four species not observed alive during visual surveys, while visual surveys detected 22 live species. Both methods confirmed the presence of three federally protected species, and eDNA uniquely detected Simpsonaias ambigua, a species rarely encountered in conventional surveys. Incorporating detection repeatability improved congruence between methods: high-repeatability detections strongly aligned with visual presence, whereas moderate- and low-repeatability detections likely represented reach-scale occupancy. Overall, eDNA metabarcoding offers an efficient and sensitive tool for assessing mussel assemblages and can substantially enhance monitoring programs when integrated with species ecology and hydrological context.
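Detection repeatability can be expressed as the fraction of replicates (for example, PCR or field replicates) in which a species is detected. The abstract does not state how the repeatability classes were defined, so the sketch below is a hypothetical illustration only: the read threshold `min_reads`, the cut-offs `HIGH_CUTOFF` and `MODERATE_CUTOFF`, and the placeholder species names are all assumptions.

```python
from typing import Dict, List

# Hypothetical cut-offs; the abstract does not define the repeatability classes.
HIGH_CUTOFF = 2 / 3
MODERATE_CUTOFF = 1 / 3


def repeatability(replicate_reads: List[int], min_reads: int = 1) -> float:
    """Fraction of replicates in which a taxon has at least `min_reads` reads."""
    if not replicate_reads:
        raise ValueError("need at least one replicate")
    hits = sum(1 for reads in replicate_reads if reads >= min_reads)
    return hits / len(replicate_reads)


def classify(score: float) -> str:
    """Map a repeatability score to an (assumed) detection class."""
    if score >= HIGH_CUTOFF:
        return "high"
    if score >= MODERATE_CUTOFF:
        return "moderate"
    return "low" if score > 0 else "not detected"


def classify_site(reads_by_species: Dict[str, List[int]], min_reads: int = 1) -> Dict[str, str]:
    """Classify each species at one site from its per-replicate read counts."""
    return {sp: classify(repeatability(reads, min_reads)) for sp, reads in reads_by_species.items()}
```

With three replicates, a species detected in all three is "high" and one detected once is "moderate"; with four replicates, a single detection (0.25) falls below the assumed moderate cut-off and is classed "low". Raising `min_reads` trades sensitivity for protection against low-level contamination.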

16
Resolving eukaryotic river biofilm communities using long-read sequencing for biomonitoring

Anderson, M. A. J.; Read, D. S.; Thorpe, A. C.; Bhanu Busi, S.; Warren, J.; Walsh, K.

2026-02-20 molecular biology 10.64898/2026.02.20.706759 medRxiv
Top 0.1%
0.8%

Freshwater biofilms host diverse microbial eukaryotic communities that are central to ecosystem functioning and serve as key indicators of water quality. Molecular biomonitoring approaches based on environmental DNA (eDNA) sequencing are increasingly used to characterise these communities, offering scalable alternatives to traditional microscopy-based assessments. Understanding how DNA sequencing methods influence the observed community composition and diversity is essential for ensuring accurate ecological interpretation. Here, we compared short-read Illumina and long-read Pacific Biosciences sequencing of the 18S rRNA gene, alongside a trimmed long-read dataset (restricted to the Illumina-primed region), to evaluate how read length and sequencing platform affect community profiling in river biofilms from seven English rivers sampled across three timepoints. Distinct community patterns were observed between the sequencing approaches, with PERMANOVA revealing significant differences in beta diversity (p = 0.001) and modest effect sizes (R² = 3.8-8.3%). While the long and trimmed datasets produced nearly identical community structures, both diverged strongly from the short-read data, suggesting that short-read sequencing captures a systematically different subset of taxa than long-read sequencing. Long-read sequencing significantly improved taxonomic resolution of the 18S rRNA gene, particularly at the genus and species levels, enabling detection of lineages that were unresolvable in short-read data. However, comparisons of paired long- and trimmed-read ASVs indicated that trimming can increase taxonomic mismatches at finer ranks, likely due to reduced sequence length rather than sequencing platform bias. Collectively, our results demonstrate that sequencing strategy significantly influences inferred community composition and taxonomic precision.
Long-read sequencing provides a more robust representation of community diversity, whereas trimmed analyses reveal how shorter amplicons may contribute to misidentification. These findings emphasise the importance of considering read length when interpreting eDNA-based assessments using the 18S rRNA gene and support the adoption of long-read sequencing for high-resolution biomonitoring applications.
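The PERMANOVA statistics reported above (a pseudo-F, a permutation p-value, and R² as the effect size) can be computed from any sample-by-sample distance matrix. The sketch below is a minimal one-factor PERMANOVA in pure Python using the standard sum-of-squares partition, SS_total = (1/N) Σ_{i<j} d_ij² and SS_within = Σ_g (1/n_g) Σ_{i<j in g} d_ij², with F = (SS_among / (a - 1)) / (SS_within / (N - a)) and R² = SS_among / SS_total. It is an illustration, not the authors' pipeline; the function names and the toy data in the usage note are invented.

```python
import random
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def pseudo_f(dist: Matrix, groups: Sequence[str]) -> Tuple[float, float]:
    """Return (pseudo-F, R^2) for a one-factor design.

    Assumes a symmetric distance matrix and within-group distances that are not all zero.
    """
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    if a < 2 or n <= a:
        raise ValueError("need at least two groups and more samples than groups")
    # Total sum of squares from all pairwise squared distances.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares, each group scaled by its own size.
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        pairs = sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += pairs / len(idx)
    ss_among = ss_total - ss_within
    f = (ss_among / (a - 1)) / (ss_within / (n - a))
    return f, ss_among / ss_total


def permanova(dist: Matrix, groups: Sequence[str], n_perm: int = 999, seed: int = 0) -> Tuple[float, float, float]:
    """Return (pseudo-F, R^2, permutation p-value)."""
    f_obs, r2 = pseudo_f(dist, groups)
    rng = random.Random(seed)
    labels: List[str] = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between samples and groups
        if pseudo_f(dist, labels)[0] >= f_obs:
            hits += 1
    # The observed labelling counts as one of the permutations.
    return f_obs, r2, (hits + 1) / (n_perm + 1)
```

For six points on a line at 0, 1, 2 (group A) and 10, 11, 12 (group B) with Euclidean distances, the sketch gives pseudo-F = 150 and R² = 150/154 ≈ 0.97; with only six samples there are just 20 distinct groupings, so the permutation p-value stays around 0.1 however large F is. With `n_perm=999`, the smallest p-value this estimator can return is (0 + 1)/(999 + 1) = 0.001.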

17
Nanopore sequencing reaches amplicon sequence variant (ASV) resolution

Riisgaard-Jensen, M.; Villanelo, S. A. R.; Andersen, K. S.; Kirkegaard, R.; Hansen, S. H.; Jiang, C.; Stefansen, A. V.; Thomsen, J. H. D.; Nielsen, P. H.; Dueholm, M. K. D.

2026-02-28 bioinformatics 10.64898/2026.02.26.708165 medRxiv
Top 0.1%
0.8%

Sequencing of ribosomal marker genes remains a cornerstone for profiling complex microbial communities. In recent years, there has been a shift from Illumina to long-read technologies, including PacBio and Oxford Nanopore Technologies (ONT). ONT is attractive due to its low startup cost and portability; however, historically high error rates have prevented direct amplicon sequence variant (ASV) generation from raw nanopore reads. This has forced most workflows to map raw reads against reference databases, constraining analyses to the taxa those databases cover. With recent improvements in ONT sequencing accuracy, we sought to challenge this view by sequencing samples of increasing complexity using primer sets targeting amplicons of different lengths, and by sequencing the exact same PCR libraries on both PacBio and ONT. We demonstrate that error-free ASVs can now be generated directly from raw nanopore reads using standard denoising algorithms originally developed for Illumina data. Current ONT read quality enables reliable reconstruction of amplicons spanning ~250 bp to ~4,200 bp and allows resolution of intragenomic rRNA gene variants. These results extend beyond simple mock communities to complex fecal, anaerobic digester, activated sludge, and soil samples. When sequencing depth is sufficient, ONT accurately recovers all or nearly all intragenomic 16S rRNA gene copy variants, showing perfect sequence identity to curated reference sequences in mock communities and to ASVs inferred from PacBio data in complex communities. Across the primer sets, ONT required higher sequencing depth than PacBio to fully resolve the communities, and this requirement increased with amplicon length. For complex samples, ONT required approximately 2-3x more reads for V4 (~250 bp) and V1-V3 (~500 bp), 4.1-5.6x more reads for V1-V8 (~1,400 bp), and 25-42x more reads for rRNA operon (OPR) amplicons (~4,200 bp).
Consequently, sequencing complex communities with OPR primers on ONT is currently not feasible because of the unrealistically high read depth required. This study provides evidence that ONT amplicon sequencing has matured to the point where true ASV-resolved profiling is practically and economically feasible, moving ONT amplicon analysis beyond reliance on OTU clustering or reference alignment and enabling its application in environments lacking comprehensive reference databases.
Key Findings:
- It is now straightforward to generate ASVs (250-4,200 bp) on ONT platforms.
- ONT can resolve intragenomic 16S rRNA gene variants.
- ASV recovery is successful in both simple and complex communities.
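The depth ratios above translate directly into sequencing budgets. The sketch below assumes a hypothetical PacBio read count that fully resolves a complex sample and multiplies it by the approximate ONT-to-PacBio ratios stated in the abstract; the dictionary keys, the function name and the example read count are illustrative only.

```python
from typing import Dict, Tuple

# Approximate ONT:PacBio read-depth ratios for complex samples, as stated in the abstract.
ONT_DEPTH_RATIO: Dict[str, Tuple[float, float]] = {
    "V4": (2.0, 3.0),      # ~250 bp
    "V1-V3": (2.0, 3.0),   # ~500 bp
    "V1-V8": (4.1, 5.6),   # ~1,400 bp
    "OPR": (25.0, 42.0),   # ~4,200 bp rRNA operon
}


def ont_reads_needed(pacbio_reads: int, amplicon: str) -> Tuple[int, int]:
    """Return the (low, high) ONT read counts matching a given PacBio depth."""
    low, high = ONT_DEPTH_RATIO[amplicon]
    return round(pacbio_reads * low), round(pacbio_reads * high)
```

For instance, `ont_reads_needed(20_000, "OPR")` returns `(500000, 840000)`, compared with `(40000, 60000)` for V4, which illustrates why OPR amplicons on ONT are described as currently impractical for complex communities.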

18
How many are you? Open data and bioinformatics reveal species misidentification and potential introgression in Chordodes (Phylum Nematomorpha)

De Vivo, M.

2026-02-05 bioinformatics 10.64898/2026.02.03.703548 medRxiv
Top 0.1%
0.7%

Open genomic data can help us understand patterns in biodiversity and can also help identify morphologically similar species. Nematomorpha, one of the less-studied animal phyla, is an example of a taxon where this is useful: data have only recently started to become available, and species identification can be difficult. In this study, I initially planned to evaluate the use of mitochondrial data for population analyses using an RNA sequencing (RNA-seq) dataset labelled as belonging to Chordodes fukuii. After surprising results from sequences of the barcoding gene cytochrome c oxidase subunit I (COXI) extracted from this dataset, I evaluated species delimitation using a previously released double-digest restriction-site-associated DNA sequencing (ddRADseq) SRA dataset combined with the RNA-seq one. PCA, R analyses with "adegenet", and ADMIXTURE confirmed the presence of two species in the RNA-seq dataset, which should be labelled as C. formosanus and C. japonensis; however, some individuals labelled as C. japonensis according to COXI clustered with C. formosanus specimens, or showed some C. formosanus ancestry when more data were used, indicating potential introgression or incomplete lineage sorting. The study shows how previously released data can be used to evaluate species delimitation and potential past demographic events, and to identify needs in DNA barcoding and genomics for avoiding future misidentification of morphologically similar species.

19
A Permutation-Based Framework for Evaluating Bias in Microbiome Differential Abundance Analysis

Zeng, K.; Fodor, A. A.

2026-03-18 bioinformatics 10.64898/2026.03.14.711836 medRxiv
Top 0.1%
0.7%

Background: In microbiome research, differential abundance analysis aids in identifying significant differences in microbial taxa across two or more conditions. Statistical approaches used for this purpose include classical tests such as the t-test and Wilcoxon test, as well as methods designed to account for the compositional nature of microbiome data, including ALDEx2, ANCOM-BC2, and metagenomeSeq. In addition, methods originally developed for RNA sequencing data, such as DESeq2 and edgeR, have been frequently applied to microbiome studies. However, the use of these methods has been controversial. One area of concern is whether different modeling frameworks produce accurate p-values when the null hypothesis is true. Results: We evaluated eight methods across six publicly available datasets. Four permutation strategies were applied to generate data under the null hypothesis: shuffling sample names, shuffling counts within samples, shuffling counts within taxa, and fully randomizing the counts table. Methods based on the negative binomial distribution (DESeq2 and edgeR) produced p-values that were consistently smaller than expected under the null hypothesis. In contrast, methods that attempt to correct for compositionality (ALDEx2, ANCOM-BC2, and metagenomeSeq) tended to produce larger-than-expected p-values, even when only sample labels were shuffled, a permutation strategy that does not alter compositional structure. These deviations were dependent on dataset characteristics and permutation strategy, suggesting complex interactions between underlying data structure and algorithm performance. Generating data to follow the expected negative binomial distribution did not eliminate the tendency of DESeq2 and edgeR to exaggerate statistical significance. Although similar patterns were observed in RNA sequencing (RNAseq) datasets, the deviations were less pronounced than in microbiome data.
In contrast, the classical t-test and Wilcoxon test yielded p-value distributions consistent with theoretical expectations across datasets and permutation strategies. Conclusions: These results indicate that the performance of several widely used differential abundance methods can be problematic under null conditions and may affect biological interpretation. Our findings emphasize the importance of careful method selection and highlight the robustness of simpler statistical approaches for reliable inference.
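The four permutation strategies described above can be sketched on a samples-by-taxa count table. This is a minimal pure-Python illustration under an assumed reading of the strategy names (rows are samples, columns are taxa; "within samples" permutes a sample's counts across taxa, "within taxa" permutes a taxon's counts across samples); the function names are hypothetical and this is not the authors' code. Each permuted table, or permuted label vector, would then be passed to a differential abundance method, whose p-values should be roughly uniform if the method is well calibrated under the null.

```python
import random
from typing import List, Sequence

Table = List[List[int]]  # rows = samples, columns = taxa


def shuffle_sample_labels(labels: Sequence[str], rng: random.Random) -> List[str]:
    """Strategy 1: permute group labels; the count table itself is untouched."""
    new = list(labels)
    rng.shuffle(new)
    return new


def shuffle_within_samples(table: Table, rng: random.Random) -> Table:
    """Strategy 2: permute each sample's counts across taxa (row sums are kept)."""
    out = []
    for row in table:
        new_row = row[:]
        rng.shuffle(new_row)
        out.append(new_row)
    return out


def shuffle_within_taxa(table: Table, rng: random.Random) -> Table:
    """Strategy 3: permute each taxon's counts across samples (column values are kept)."""
    out = [row[:] for row in table]
    for j in range(len(table[0])):
        column = [row[j] for row in table]
        rng.shuffle(column)
        for i, value in enumerate(column):
            out[i][j] = value
    return out


def randomize_table(table: Table, rng: random.Random) -> Table:
    """Strategy 4: permute every cell (only the shape and the multiset of counts are kept)."""
    n_taxa = len(table[0])
    flat = [x for row in table for x in row]
    rng.shuffle(flat)
    return [flat[i:i + n_taxa] for i in range(0, len(flat), n_taxa)]


def fraction_below(pvalues: Sequence[float], alpha: float = 0.05) -> float:
    """Share of p-values below alpha; roughly alpha for a calibrated test under the null."""
    return sum(p < alpha for p in pvalues) / len(pvalues)
```

Only strategy 1 leaves every sample's composition intact, which is why inflated or deflated p-values under label shuffling alone point to the method rather than to the permutation.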

20
Beta Diversity Meta-Analysis Shows Transformations Have Broadly Similar Performance in Machine Learning Applications Regardless of Compositional or Phylogenetic Awareness

Fry Brumit, D.; Sorgen, A. A.; Fodor, A.

2026-01-23 bioinformatics 10.64898/2026.01.20.699043 medRxiv
Top 0.1%
0.7%

Background: Beta diversity quantifies pairwise differences between communities through matrix transformations, which are either naive to phylogeny or phylogenetically aware. Methods have recently been introduced that also account for compositionality and sparsity and that produce larger PERMANOVA pseudo-F scores, which are used to measure effect size. In this study, we ask how transformations that consider phylogeny, sparsity, and compositionality compare with older, simpler methods across five publicly available datasets. Results: Applying random forest methods to 107 features across 5 datasets did not show that any beta diversity method consistently increased classification performance over the others. Limiting datasets to just three eigenvalue decomposition (EVD) axes leads to a small but reliably detectable decrease in performance compared with giving random forest models access to log-normalized or even un-normalized raw count tables. Increasing the number of EVD axes included in classification improves performance across all available models up to ~10-20 axes. We observed larger variation in PERMANOVA pseudo-F scores for some features with phylogenetically and compositionally aware beta diversity algorithms across multiple datasets, but did not find that these higher scores yielded consistently increased resolution or accuracy for machine learning methods. Conclusions: While EVD remains an essential technique for dimension reduction, retaining structure beyond 3 EVD axes may improve performance. Elevated but insignificant pseudo-F scores may be explained by the higher variance in pseudo-F scores for phylogenetically or compositionally aware methods compared with simpler methods. This indicates that pseudo-F scores are an unreliable overall metric of algorithm performance.
Taken together, our results show that choice of beta diversity metric does not yield a substantial difference in effect size or machine learning performance. We conclude that analysts are free to choose appropriate methods for each dataset balancing simplicity vs. corrections for phylogeny, sparsity and compositionality and that these choices are unlikely to impact the overall power and resolution of biological conclusions from microbial data.